error distribution
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom (0.04)
- Health & Medicine > Therapeutic Area > Neurology (0.67)
- Health & Medicine > Health Care Technology (0.46)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Automated Classification of Model Errors on ImageNet
While the ImageNet dataset has been driving computer vision research over the past decade, significant label noise and ambiguity have made top-1 accuracy an insufficient measure of further progress. To address this, new label-sets and evaluation protocols have been proposed for ImageNet showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus on investigating why the remaining errors persist.Recent work in this direction employed a panel of experts to manually categorize all remaining classification errors for two selected models. However, this process is time-consuming, prone to inconsistencies, and requires trained experts, making it unsuitable for regular model evaluation thus limiting its utility. To overcome these limitations, we propose the first automated error classification framework, a valuable tool to study how modeling choices affect error distributions. We use our framework to comprehensively evaluate the error distribution of over 900 models. Perhaps surprisingly, we find that across model architectures, scales, and pre-training corpora, top-1 accuracy is a strong predictor for the of all error types.
Approximate Multiplier Induced Error Propagation in Deep Neural Networks
Alahakoon, A. M. H. H., Saadat, Hassaan, Jayasinghe, Darshana, Parameswaran, Sri
Deep Neural Networks (DNNs) rely heavily on dense arithmetic operations, motivating the use of Approximate Multipliers (AxMs) to reduce energy consumption in hardware accelerators. However, a rigorous mathematical characterization of how AxMs error distributions influence DNN accuracy remains underdeveloped. This work presents an analytical framework that connects the statistical error moments of an AxM to the induced distortion in General Matrix Multiplication (GEMM). Using the Frobenius norm of the resulting error matrix, we derive a closed form expression for practical DNN dimensions that demonstrates the distortion is predominantly governed by the multiplier mean error (bias). To evaluate this model in realistic settings, we incorporate controlled error injection into GEMM and convolution layers and examine its effect on ImageNet scale networks. The predicted distortion correlates strongly with the observed accuracy degradation, and an error configurable AxM case study implemented on an FPGA further confirms the analytical trends. By providing a lightweight alternative to behavioral or hardware level simulations, this framework enables rapid estimation of AxM impact on DNN inference quality.
- Oceania > Australia > New South Wales > Sydney (0.05)
- Europe (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
Synthetic Error Injection Fails to Elicit Self-Correction In Language Models
Wu, David X., Kapur, Shreyas, Sahai, Anant, Russell, Stuart
Reinforcement learning has become the dominant paradigm for eliciting reasoning and self-correction capabilities in large language models, but its computational expense motivates exploration of alternatives. Inspired by techniques from autonomous driving and robotics, we investigate whether supervised learning with synthetic error injection can induce self-correction abilities in language models. Our approach inserts artificial errors into reasoning chains, masks them, and supervises the model to recognize and correct these mistakes. Despite the intuitive appeal of this method, we find that it fails to significantly improve performance even on simple synthetic tasks across multiple models. Moreover, even when the model catches its own error, it often parrots the original mistake. We find that the distribution shift of synthetic errors to on-policy errors significantly degrades the error-correction capabilities of the fine-tuned model, even with good synthetic coverage of on-policy errors. Our results help explain why on-policy reinforcement learning methods have proven uniquely effective for eliciting self-correction.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Assessing model error in counterfactual worlds
Howerton, Emily, Lessler, Justin
Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, scenario projections are rarely evaluated retrospectively. Differences between projections and observations come from two sources: scenario deviation and model miscalibration. We argue the latter is most important for assessing the value of models in decision making, but requires estimating model error in counterfactual worlds. Here we present and contrast three approaches for estimating this error, and demonstrate the benefits and limitations of each in a simulation experiment. We provide recommendations for the estimation of counterfactual error and discuss the components of scenario design that are required to make scenario projections evaluable.
- North America > United States > North Carolina (0.04)
- Europe > Spain (0.04)
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Testing-driven Variable Selection in Bayesian Modal Regression
Duan, Jiasong, Zhang, Hongmei, Huang, Xianzheng
We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite parameter estimation. A test statistic is constructed to exploit the shape of the model error distribution to effectively separate informative covariates from unimportant ones. Through simulations, we demonstrate and evaluate the efficacy of the proposed method in identifying important covariates in the presence of non-Gaussian model errors. Finally, we apply the proposed method to analyze two datasets arising in genetic and epigenetic studies.
- North America > United States > South Carolina > Richland County > Columbia (0.14)
- North America > United States > Tennessee > Shelby County > Memphis (0.04)
- Europe > United Kingdom > England > Isle of Wight (0.04)
- Health & Medicine > Therapeutic Area > Hematology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Neural Networks for Censored Expectile Regression Based on Data Augmentation
Expectile regression neural networks (ERNNs) are powerful tools for capturing heterogeneity and complex nonlinear structures in data. However, most existing research has primarily focused on fully observed data, with limited attention paid to scenarios involving censored observations. In this paper, we propose a data augmentation-based ERNNs algorithm, termed DAERNN, for modeling heterogeneous censored data. The proposed DAERNN is fully data-driven, requires minimal assumptions, and offers substantial flexibility. Simulation studies and real-data applications demonstrate that DAERNN outperforms existing censored ERNNs methods and achieves predictive performance comparable to models trained on fully observed data. Moreover, the algorithm provides a unified framework for handling various censoring mechanisms without requiring explicit parametric model specification, thereby enhancing its applicability to practical censored data analysis. Introduction Expectile regression (ER), first proposed by Newey and Powell (1987), has been extensively studied as a flexible alternative to quantile regression (QR) for modeling heterogeneous distributions.
- Law > Civil Rights & Constitutional Law (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Energy (1.00)